Peripatric speciation within Torreya fargesii (Taxaceae) in the Hengduan Mountains inferred from multi-loci phylogeography

Background The Hengduan Mountains (HDM) are one of the major global biodiversity hotspots in the world. Several evolutionary scenarios, especially in-situ diversification, have been proposed to account for the high species richness of temperate plants. However, peripatric speciation, an important mode of allopatric speciation, has seldom been reported in this region. Results Here, two chloroplast DNA regions and 14 nuclear loci were sequenced for 112 individuals from 10 populations of Torreya fargesii var. fargesii and 63 individuals from 6 populations of T. fargesii var. yunnanensis. Population genetic analyses revealed that the two varieties are well differentiated genetically (FST, 0.5765) and have uneven genetic diversity (π, 0.00221 vs. 0.00073 on an average of nuclear loci). The gene genealogical relationship showed that T. fargesii var. yunnanensis is inferred as derived from T. fargesii var. fargesii, which was further supported by the coalescent simulations (DIYABC, fastsimcoal2 and IMa2). By the coalescent simulations, the divergence time (~ 2.50–3.65 Ma) and the weak gene flow between the two varieties were detected. The gene flow was asymmetrical and only occurred in later stages of divergence, which is caused by second contact due to the population expansion (~ 0.61 Ma) in T. fargesii var. fargesii. In addition, niche modeling indicated that the two varieties are differentiated geographically and ecologically and have unbalanced distribution range. Conclusions Overall, T. fargesii var. fargesii is always parapatric with respect to T. fargesii var. yunnanensis, and the latter derived from the former in peripatry of the HDM following a colonization from central China during the late Pliocene. Our findings demonstrate that peripatric speciation following dispersal events may be an important evolutionary scenario for the formation of biodiversity hotspot of the HDM. Supplementary Information The online version contains supplementary material available at 10.1186/s12862-023-02183-1.

Additional file 1: Table S1.Variable sites in seven haplotypes from two chloroplast DNA regions, trnL-trnF and rpoB-trnC.Table S2.Sampling information, the distribution of chloroplast haplotypes, and chloroplast genetic diversity of T. fargesii var.fargesii and T. fargesii var.yunnanensis.Table S3.Nucleotide diversity and haplotype diversity across 14 nuclear loci in T. fargesii var.fargesii and T. fargesii var.
yunnanensis.Table S4.Neutrality test across 14 nuclear loci in T. fargesii var.fargesii and T. fargesii var.yunnanensis.Table S5.Genetic differentiation (FST) between T. fargesii var.fargesii and T. fargesii var.yunnanensis for each nuclear locus and across all loci.Table S6.Posterior probabilities of four basic scenarios (A1, B1, C1 and D1 in Figure S5) modeled by DIYABC based on the first dataset (14 nuclear loci).The scenario with highest posterior probability was shown in bold.Table S7.Posterior estimates of demographic parameters for the best scenario (B1 in Table S6) revealed by DIYABC based on the first dataset (14 nuclear loci).Table S8.Posterior probabilities of four basic scenarios (A1, B1, C1 and D1 in Figure S5) modeled by DIYABC based on the second dataset (12 nuclear loci).Table S9.Posterior estimates of demographic parameters for the best scenario (B1 in Table S8) revealed by DIYABC based on the second dataset (12 nuclear loci).Table S10.The maximum likelihood estimates for each scenario in Figure S5 calculated by fastsimcoal2 based on the first dataset (14 nuclear loci).The best fitting scenario was shown in bold.Table S11.Posterior estimates of demographic parameters for the best scenario (B3 in Table S10) simulated by fastsimcoal2 based on the first dataset (14 nuclear loci).Table S12.The maximum likelihood estimates for each scenario in Figure S5 calculated by fastsimcoal2 based on the second dataset (12 nuclear loci).The best fitting scenario was shown in bold.Table S13.Posterior estimates of demographic parameters for the best scenario (B1 in Table S12) simulated by fastsimcoal2 based on the second dataset (12 nuclear loci).Table S14.Maximum likelihood estimates (MLE) and 95% highest posterior density (HPD) intervals of demographic parameters estimated in IMa2 based on the first dataset (14 nuclear loci).Table S15.Maximum likelihood estimates (MLE) and 95% highest posterior density (HPD) intervals of demographic parameters estimated in IMa2 based on the second dataset (12 nuclear loci).Table S16.Transcriptome sequences from T. grandis used to develop primers in this study.Table S17.Function annotations for 14 transcriptome sequences against four protein databases (Nr, GO, KO and Swiss-Prot).
Table S18.Primer sequences, annealing temperatures, PCR products sizes, and gene ID of 14 nuclear loci used in this study.Table S19.The prior distribution of parameters for all scenarios in the simulations of DIYABC and fastsimcoal2.Table S20.The geographical records of T. fargesii var.fargesii and T. fargesii var.yunnanensis used in the ecological niche modeling.Table S21.Pairwise Pearson correlation coefficients (r) of 20 environmental variables.Figure S1.Bayesian tree (left) and Neighbor-joining (right) tree were constructed by the concatenated nuclear dataset using MrBayes and MEGA, respectively.Support values were showed above the branches.Figure S2.
Phylogenetic trees were inferred using BEAST based on the partitioned nuclear datasets, the first dataset (14 nuclear loci) (left) and the second dataset (12 nuclear loci) (right).

Figure S3 .
The most likely number of clusters (K) inferred with LnP(D) (left) and ΔK (right) statistics implemented in STRUCTURE.Figure S4.Posterior probability distributions of effective population size (θ), migration rate (m) and divergence time (t) between two varieties estimated separately using IM model based on the first dataset (14 nuclear loci) (left) and the second dataset (12 nuclear loci) (right).

Figure S5 .
Four basic scenarios (A1, B1, C1 and D1) with different migration models for the divergence and demographic history of T. fargesii var.fargesii and T. fargesii var.yunnanensis.NF and NY represent the current population sizes of T. fargesii var.fargesii and T. fargesii var.yunnanensis, and N1 and N2 represent the population sizes between ancestral population and current population of T. fargesii var.yunnanensis and T. fargesii var.fargesii, respectively.NA represents the ancestral population size.t0, t1, and t2 represent the time of population changes and t3 the divergent time.A2-A5, B2-B5, C2-C5, and D2-D5 are the derivatives of A1, B1, C1, and D1 by adding migration parameters at different times, respectively.Figure S6.Posterior probabilities for four basic scenarios (A1, B1, C1 and D1 in Figure S5) (A), model checking for the optimum scenario B1 (B), and level of confidence in scenario choice (including type I error and type II error) (C) estimated using direct approach and logistic regression in DIYABC based on the first dataset (14 nuclear loci).Figure S7.Posterior probabilities for four basic scenarios (A1, B1, C1 and D1 in Figure S5) (A), model checking for the optimum scenario B1 (B), and level of confidence in scenario choice (including type I error and type II error) (C) estimated using direct approach and logistic regression in DIYABC based on the second dataset (12 nuclear loci).Figure S8.Bayesian skyline plot inferred for T. fargesii var.fargesii and T. fargesii var.yunnanensis in BEAST based on 14 nuclear loci.The bold and thin lines are the median posterior and 95% highest posterior densities of effective population size through time, respectively.The effective population size and time were not scaled using mutation rate.Figure S9.The ROC curve (AUC) for each predicted distribution, present-day, Mid-Holocene (MH, under MIROC and CCSM models), the Last Glacial Maximum (LGM, under MIROC and CCSM models), and the Last Interglacial (LIG) climatic periods.Figure S10.Climate niches during Mid-Holocene (MH) and the Last Glacial Maximum (LGM) under CCSM model were modeled and drawn using MAXENT 3.4.3for T. fargesii var.fargesii and T. fargesii var.yunnanensis.

Figure S1
Figure S1 Bayesian tree (left) and Neighbor-joining (right) tree were constructed by the concatenated nuclear dataset using MrBayes and MEGA, respectively.Support values were showed above the branches.

Figure S2
Figure S2 Phylogenetic trees were inferred using BEAST based on the partitioned nuclear datasets, the first dataset (14 nuclear loci) (left) and the second dataset (12 nuclear loci) (right).Posterior values of all clades are lower than 0.5.

Figure S3
Figure S3The most likely number of clusters (K) inferred with LnP(D) (left) and ΔK (right) statistics implemented in STRUCTURE.

Figure S4
Figure S4 Posterior probability distributions of effective population size (θ), migration rate (m) and divergence time (t) between two varieties estimated separately using IM model based on the first dataset (14 nuclear loci) (left) and the second dataset (12 nuclear loci) (right).

Figure S5
Figure S5 Four basic scenarios (A1, B1, C1 and D1) with different migration models for the divergence and demographic history of T. fargesii var.fargesii and T. fargesii var.yunnanensis.NF and NY represent the current population sizes of T. fargesii var.fargesii and T. fargesii var.yunnanensis, and N1 and N2 represent the population sizes between ancestral population and current population of T. fargesii var.yunnanensis and T. fargesii var.fargesii, respectively.NA represents the ancestral population size.t0, t1, and t2 represent the time of population changes and t3 the divergent time.A2-A5, B2-B5, C2-C5, and D2-D5 are the derivatives of A1, B1, C1, and D1 by adding migration parameters at different times, respectively.

Figure S6
Figure S6 Posterior probabilities for four basic scenarios (A1, B1, C1 and D1 in Figure S5) (A), model checking for the optimum scenario B1 (B), and level of confidence in scenario choice (including type I error and type II error) (C) estimated using direct approach and logistic regression in DIYABC based on the first dataset (14 nuclear loci).

Figure S7
Figure S7 Posterior probabilities for four basic scenarios (A1, B1, C1 and D1 in Figure S5) (A), model checking for the optimum scenario B1 (B), and level of confidence in scenario choice (including type I error and type II error) (C) estimated using direct approach and logistic regression in DIYABC based on the second dataset (12 nuclear loci).

Figure S8
Figure S8 Bayesian skyline plot inferred for T. fargesii var.fargesii and T. fargesii var.yunnanensis in BEAST based on 14 nuclear loci.The bold and thin lines are the median posterior and 95% highest posterior densities of effective population size through time, respectively.The effective population size and time were not scaled using mutation rate.

Figure S10
Figure S10 Climate niches during Mid-Holocene (MH) and the Last Glacial Maximum (LGM) under CCSM model were modeled and drawn using MAXENT 3.4.3for T. fargesii var.fargesii and T. fargesii var.yunnanensis.

Table S1
Variable sites in seven haplotypes from two chloroplast DNA regions, trnL-trnF and rpoB-trnC.

Table S2
Sampling information, the distribution of chloroplast haplotypes, and chloroplast genetic diversity of T. fargesii var.fargesii and T. fargesii var.yunnanensis.
D, Tajima's D statistic; D* and F*, Fu and Li's D* and Fu and Li's F*; H, Fay and Wu's H; MFDM, maximum frequency of derived mutations test; -, failed to be computed for lack of enough variation.*, significant level at P < 0.05.

Table S5
Genetic differentiation (FST) between T. fargesii var.fargesii and T. fargesii var.yunnanensis for each nuclear locus and across all loci.

Table S6
Posterior probabilities of four basic scenarios (A1, B1, C1 and D1 in FigureS5) modeled by DIYABC based on the first dataset (14 nuclear loci).The scenario with highest posterior probability was shown in bold.

Table S7
Posterior estimates of demographic parameters for the best scenario (B1 in TableS6) revealed by DIYABC based on the first dataset (14 nuclear loci).
The model with highest posterior probability was shown in bold font.

Table S9
Posterior estimates of demographic parameters for the best scenario (B1 in TableS8) revealed by DIYABC based on the second dataset (12 nuclear loci).

Table S10
The maximum likelihood estimates for each scenario in FigureS5calculated by fastsimcoal2 based on the first dataset (14 nuclear loci).The best fitting scenario was shown in bold.

Table S11
Posterior estimates of demographic parameters for the best scenario (B3 in TableS10) simulated by fastsimcoal2 based on the first dataset (14 nuclear loci).

Table S12
The maximum likelihood estimates for each scenario in FigureS5calculated by fastsimcoal2 based on the second dataset (12 nuclear loci).The best fitting scenario was shown in bold.

Table S13
Posterior estimates of demographic parameters for the best scenario (B1 in TableS12) simulated by fastsimcoal2 based on the second dataset (12 nuclear loci).

Table S14
Maximum likelihood estimates (MLE) and 95% highest posterior density (HPD) intervals of demographic parameters estimated in IMa2 based on the first dataset (14 nuclear loci).F and N F , effective population size of T. fargesii var.fargesii; θ Y and N Y , effective population size of T. fargesii var.yunnanensis.m Y>F and m F>Y , population migration rate θ from T. fargesii var.yunnanensis to T. fargesii var.fargesii and T. fargesii var.fargesii to T. fargesii var.yunnanensis; 2Nm (F) and 2Nm (Y), population migration rate for T. fargesii var.fargesii and T. fargesii var.yunnanensis; t and T, divergent time between T. fargesii var.yunnanensis and T. fargesii var.fargesii.θ, m and t are scaled by the mutation rate, while N, 2Nm and T are scaled by individuals or years.

Table S15
Maximum likelihood estimates (MLE) and 95% highest posterior density (HPD) intervals of demographic parameters estimated in IMa2 based on the second dataset (12 nuclear loci).F and N F , effective population size of T. fargesii var.fargesii; θ Y and N Y , effective population size of T. fargesii var.yunnanensis.m Y>F and m F>Y , population migration rate θ from T. fargesii var.yunnanensis to T. fargesii var.fargesii and T. fargesii var.fargesii to T. fargesii var.yunnanensis; 2Nm (F) and 2Nm (Y), population migration rate for T. fargesii var.fargesii and T. fargesii var.yunnanensis; t and T, divergent time between T. fargesii var.yunnanensis and T. fargesii var.fargesii.θ, m and t are scaled by the mutation rate, while N, 2Nm and T are scaled by individuals or years.

Table S16
Transcriptome sequences from T. grandis used to develop primers in this study.

Table S17
Function annotations for 14 transcriptome sequences against four protein databases (Nr, GO, KO and Swiss-Prot).

Table S18
Primer sequences, annealing temperatures, PCR products sizes, and gene ID of 14 nuclear loci used in this study.

Table S20
The geographical records of T. fargesii var.fargesii and T. fargesii var.yunnanensis used in the ecological niche modeling.

Table S21
Pairwise Pearson correlation coefficients (r) of 20 environmental variables.